
Add flag to prevent implicit removal of existing device/fs. #43

Merged: 7 commits merged from non-destructive into linux-system-roles:master on Nov 15, 2019

Conversation

dwlehman (Collaborator):

This flag, storage_safe_mode, will prevent the implicit/automatic removal of existing file systems or device stacks. In other words, with this flag set, nothing will be removed except pools and volumes whose state is defined as absent.

This is related to #42
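
For illustration, a minimal sketch of how the flag would be set when applying the role (the host pattern, disk, volume name, and mount point below are placeholders, not taken from this PR):

- hosts: all
  roles:
    - name: linux-system-roles.storage
      vars:
        storage_safe_mode: true
        storage_volumes:
          - name: data
            type: disk
            disks:
              - /dev/vdb
            fs_type: xfs
            mount_point: /mnt/data

With the flag set to true, the role is expected to fail rather than discard any existing formatting or device stack on /dev/vdb; with it set to false, the current implicit-removal behavior is kept.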

defaults/main.yml
@@ -3,6 +3,7 @@
storage_provider: "blivet"
storage_use_partitions: null
storage_disklabel_type: null # leave unset to allow the role to select an appropriate label type
storage_safe_mode: false # fail instead of implicitly/automatically removing devices or formatting
Member:

Shouldn't the default be true? @tabowling what do you think?

pcahyna (Member) commented Oct 3, 2019:

@dwlehman it still silently destroys data in some situations. Consider a filesystem created with this playbook

- hosts: all
  roles:
    - name: linux-system-roles.storage
      storage_volumes:
        - name: barefs
          type: disk
          disks:
            - /dev/vdc
          fs_type: xfs
          mount_point: /mnt/barefs

and then changed with this playbook

- hosts: all
  roles:
    - name: linux-system-roles.storage
      storage_volumes:
        - name: barefs
          type: disk
          disks:
            - /dev/vdc
          fs_type: ext2
          mount_point: /mnt/barefs

(note the fs_type change): the filesystem is removed and recreated.

dwlehman (Collaborator Author) commented Oct 3, 2019 via email.

pcahyna (Member) commented Oct 4, 2019:

By the way, changing the type from ext2 to ext3 is performed destructively as well. Is that necessary? I would think that adding the journal inode and setting the HAS_JOURNAL flag should be easy.

pcahyna (Member) commented Oct 10, 2019:

@dwlehman Are the known problems now fixed by the latest commits, and is the PR ready from your side?

dwlehman (Collaborator Author):

Yes, I think it's all set. FWIW there is a test case now for the last problem you found (in tests/tests_disk_errors.yml).

dwlehman (Collaborator Author):

Switching from ext2->ext3 is no different from ext2->xfs as of now. I'm a big fan of consistency. We can talk about putting in a special case for ext2->ext3, but it's not related to this pull request IMO.

pcahyna (Member) commented Oct 10, 2019:

Concerning type switching, I found another surprise: if I omit fs_type:, the volume gets reformatted with the default type (xfs). Wouldn't it be more reasonable to keep the existing filesystem?
(EDIT: I originally wrote type:, but I meant fs_type:.)

dwlehman (Collaborator Author):

I'm not sure about that. I can see a case for either behavior. I have a somewhat difficult time imagining a user omitting fs_type and having a specific expectation about the result. Probably some people will complain whichever way we choose to interpret it.

pcahyna (Member) commented Oct 11, 2019:

The scenario I am imagining is that someone takes a storage configuration created earlier by other tools (anaconda, manual setup, ...), already used for storing data, and starts using the storage role to manage it, i.e. to change its properties, most likely the size. It can be unexpected to see the storage role reformat your volumes (or at least try to remove them, with the changes in this PR) and recreate them just because you specified only the size (the attribute you want to change) and not the filesystem type (or any other attribute that should be left unchanged). See the sketch below.
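
A hedged illustration of that scenario (the pool and volume names, disk, and size are invented here): only the size is stated, and fs_type and the other attributes are deliberately omitted in the expectation that they stay as they are on disk.

    storage_pools:
      - name: existing_vg          # hypothetical VG created earlier outside the role
        disks:
          - /dev/vdb
        volumes:
          - name: data
            size: "20g"            # the only attribute the user actually wants to change
            mount_point: /mnt/data
            # fs_type intentionally omitted; the existing filesystem should be kept

As discussed above, omitting fs_type currently means the default (xfs) is applied, which is exactly the surprise being described.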

@@ -660,6 +673,7 @@ def run_module():
volumes=dict(type='list'),
packages_only=dict(type='bool', required=False, default=False),
disklabel_type=dict(type='str', required=False, default=None),
safe_mode=dict(type='bool', required=False, default=False),
Member:

Would it be easy to set safe_mode individually per pool or volume?
My reasoning is that one may need to reformat a disk. One then needs to set storage_safe_mode=false globally, and if one is managing the complete storage configuration in one call to the role, a mistake in an unrelated part of the configuration variable can then have unintended destructive consequences. It would then be safer to specify safe_mode only on the disks one needs to reformat.
(The tricky part is how to specify safe_mode for physical disks, which are provided in the "disks" arrays.)
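
A purely hypothetical sketch of what such a per-volume override might look like; no such option exists in the role or in this PR, and the volume-level safe_mode key below is invented only to illustrate the suggestion:

    storage_volumes:
      - name: scratch              # the one disk we intend to wipe and reformat
        type: disk
        disks:
          - /dev/vdd
        fs_type: xfs
        safe_mode: false           # hypothetical per-volume override
      - name: data                 # unrelated volume, still protected by the global setting
        type: disk
        disks:
          - /dev/vdc
        fs_type: xfs
        mount_point: /mnt/data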

Collaborator Author:

How is reformatting a disk different from removing a file system in terms of being "unsafe" or "destructive"? Perhaps you are talking about formatting an unformatted disk, which would not be prevented by storage_safe_mode.

Member:

I am talking about formatting a disk which previously had some data on it (to be discarded on purpose). This should require storage_safe_mode=false. Setting it to false then removes the protection against mistakes in other parts of the configuration.

pcahyna (Member) commented Oct 11, 2019:

> Yes, I think it's all set. FWIW there is a test case now for the last problem you found (in tests/tests_disk_errors.yml).

Sorry, but it still removes data for me. Consider this playbook (very similar to the two that I pasted above)

- hosts: all
  roles:
    - name: linux-system-roles.storage
      vars:
        storage_safe_mode: true
        storage_volumes:
          - name: barefs
            type: disk
            disks:
              - /dev/vdc
            fs_type: xfs
            mount_point: /mnt/barefs
  tasks:
    - name: create a file
      file:
        path: /mnt/barefs/foo
        state: touch
- hosts: all
  roles:
    - name: linux-system-roles.storage
      vars:
        storage_safe_mode: true
        storage_volumes:
          - name: barefs
            type: disk
            disks:
              - /dev/vdc
            fs_type: ext2
            mount_point: /mnt/barefs
  tasks:
    - name: stat the file
      stat:
        path: /mnt/barefs/foo
      register: stat_r
    - name: assert file presence
      assert:
        that:
          stat_r.stat.isreg is defined and stat_r.stat.isreg

It should fail on the second role application, but in reality it fails only in the last assert, which verifies that data have not been destroyed. I don't know how it is possible that your newly added test passes, but I see that you don't actually test for data not having been destroyed, only for the role output.
EDIT: I think I know: in addition to the tests being rather shallow, you wrap them in ignore_errors: yes!

that: "{{ blivet_output.failed and
blivet_output.msg|regex_search('cannot remove existing formatting on volume.*in safe mode') and
not blivet_output.changed }}"
msg: "Unexpected behavior w/ existing data on specified disks"
Member:

I would say that this test is grossly insufficient considering the seriousness of the condition that it checks. What if the role destroys data and then sets blivet_output according to expectations?

Member:

A more thorough test is in #46

disks: "{{ unused_disks }}"
present: false

ignore_errors: yes
pcahyna (Member) commented Oct 11, 2019:

@dwlehman I'm afraid asserts will not do anything useful in a block that has ignore_errors: yes.
Also, instead of ignoring errors, it would be better to actually check that the role fails at every invocation where it is supposed to, by catching the failure in a rescue: section, in addition to checking the output facts. A sketch follows below.
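
A minimal sketch of that block/rescue pattern, assuming the role is invoked with include_role (task names and wording are illustrative, not code from this PR; the UNREACH trick mirrors the rescue check discussed later in this review):

    - block:
        - name: Apply the role in a way that must fail in safe mode
          include_role:
            name: linux-system-roles.storage
          vars:
            storage_safe_mode: true
        - name: UNREACH
          fail:
            msg: "this should be unreachable if the role failed as expected"
      rescue:
        - name: Check that the failure came from the role, not from UNREACH
          assert:
            that:
              - ansible_failed_task.name != 'UNREACH'
            msg: "Role has not failed when it should have"

If the role fails, the rescue runs and the assert passes; if the role unexpectedly succeeds, the UNREACH task fails instead and the assert catches it.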

storage_pools:
- name: foo
disks: "{{ unused_disks }}"
type: partition
Member:

This does not work.

TASK [storage : Apply defaults to pools and volumes [2/6]] *********************
task path: /home/pcahyna/linux-system-roles/storage/tasks/main-blivet.yml:36
fatal: [/home/pcahyna/linux-system-roles-testing/local/rhel-guest-image-7.7-200.x86_64.qcow2]: FAILED! => {"msg": "could not find 'volumes' key in iterated item {'state': 'present', 'type': 'partition', 'name': 'foo', 'disks': ['vdc']}"}

It is arguably a bug that one cannot create an empty pool (partition or otherwise). But until this is fixed, tests should take it into account.

Member:

Moreover, even if I delete type: partition and add one volume in an effort to make it work, blivet refuses to create a VG on a disk which already contains a filesystem, even with safe mode off, so the test still fails:

TASK [storage : manage the pools and volumes to match the specified state] *****
task path: /home/pcahyna/linux-system-roles/storage/tasks/main-blivet.yml:102
fatal: [/home/pcahyna/linux-system-roles-testing/local/rhel-guest-image-7.7-200.x86_64.qcow2]: FAILED! => {"actions": [], "changed": false, "leaves": [], "mounts": [], "msg": "Failed to commit changes to disk", "packages": ["xfsprogs", "lvm2"], "pools": [], "volumes": []}

Member:

PR #46 has the change to create one volume, but this still fails as in the previous comment.

Member:

> Moreover, even if I delete type: partition and add one volume in an effort to make it work, blivet refuses to create a VG on a disk which already contains a filesystem, even with safe mode off, so the test still fails:
>
> TASK [storage : manage the pools and volumes to match the specified state] *****
> task path: /home/pcahyna/linux-system-roles/storage/tasks/main-blivet.yml:102
> fatal: [/home/pcahyna/linux-system-roles-testing/local/rhel-guest-image-7.7-200.x86_64.qcow2]: FAILED! => {"actions": [], "changed": false, "leaves": [], "mounts": [], "msg": "Failed to commit changes to disk", "packages": ["xfsprogs", "lvm2"], "pools": [], "volumes": []}

Sorry, I think it was because I forgot to specify a size for the volume. The error message is very misleading, though.

storage_pools:
- name: foo
disks: "{{ unused_disks }}"
type: partition
Member:

It appears that type: partition is not implemented.

TASK [storage : get required packages] *****************************************
task path: /home/pcahyna/linux-system-roles/storage/tasks/main-blivet.yml:88
fatal: [/home/pcahyna/linux-system-roles-testing/local/rhel-guest-image-7.7-200.x86_64.qcow2]: FAILED! => {"actions": [], "changed": false, "leaves": [], "mounts": [], "msg": "Pool 'foo' has unknown type 'partition'", "packages": [], "pools": [], "volumes": []}

Member:

A fix for the test to not use partition for now is in #46

pcahyna (Member) commented Oct 15, 2019:

@dwlehman the last changes are definitely an improvement. Changing the filesystem type does not destroy data anymore. Here is the error message:

TASK [storage : get required packages] *****************************************
task path: /home/pcahyna/linux-system-roles/storage/tasks/main-blivet.yml:88
fatal: [/home/pcahyna/linux-system-roles-testing/local/rhel-guest-image-7.7-200.x86_64.qcow2]: FAILED! => {"actions": [], "changed": false, "leaves": [], "mounts": [], "msg": "cannot remove existing formatting on volume 'test1' in safe mode", "packages": [], "pools": [], "volumes": []}

The assert

- name: Verify the output
  assert:
    that: "blivet_output.failed and
           blivet_output.msg|regex_search('cannot remove existing formatting on volume.*in safe mode') and
              not blivet_output.changed"
    msg: "Unexpected behavior w/ existing data on specified disks"

still fails though, because blivet_output does not indicate failure (despite the actual failure):

    "blivet_output": {
        "actions": [
            {
                "action": "create format",
                "device": "/dev/vdc",
                "fs_type": "ext4"
            }
        ],
        "changed": true,
        "failed": false,
        "leaves": [
            "/dev/sr0",
            "/dev/vda1",
            "/dev/vdb",
            "/dev/vdc"
        ],
        "mounts": [
            {
                "dump": 0,
                "fstype": "ext4",
                "opts": "defaults",
                "passno": 0,
                "path": "/opt/test1",
                "src": "/dev/vdc",
                "state": "mounted"
            }
        ],
        "packages": [
            "xfsprogs",
            "e2fsprogs"
        ],
        "pools": [],
        "volumes": [
            {
                "_device": "/dev/vdc",
                "_mount_id": "/dev/vdc",
                "disks": [
                    "vdc"
                ],
                "fs_create_options": "",
                "fs_label": "",
                "fs_overwrite_existing": true,
                "fs_type": "ext4",
                "mount_check": 0,
                "mount_device_identifier": "uuid",
                "mount_options": "defaults",
                "mount_passno": 0,
                "mount_point": "/opt/test1",
                "name": "test1",
                "size": 0,
                "state": "present",
                "type": "disk"
            }
        ]
    }

pcahyna (Member) commented Oct 15, 2019:

Test improvements are in PR #46.

dwlehman (Collaborator Author):

@pcahyna I'm done until Tuesday. It looks good to me except for #48, which I do not think will be fixed here anyway.

pcahyna (Member) commented Oct 21, 2019:

@dwlehman #48 is subject to further discussion, but what about #50? That one looks like a real bug. (I am not saying that it is necessary to fix it in this PR, but it should not be forgotten.)

pcahyna (Member) commented Oct 21, 2019:

@dwlehman, more seriously, CI now fails in all the centos7/rhel7 runs, and always in the same place, with a Python TypeError, so it is not a CI infrastructure problem.

pcahyna (Member) commented Oct 21, 2019:

> @dwlehman, more seriously, CI now fails in all the centos7/rhel7 runs, and always in the same place, with a Python TypeError, so it is not a CI infrastructure problem.

Hopefully corrected in #51

vars:
storage_volumes:
- name: test1
type: disk
Member:

@dwlehman Why do we have type: disk in the cleanup when we had type: partition in the previous task? Cleanup does not seem to happen properly.
Just replacing disk with partition does not work either, though:
... Traceback (most recent call last):\r\n File \"/root/.ansible/tmp/ansible-tmp-1571753120.2970297-255844982294535/AnsiballZ_blivet.py\", line 114, in <module>\r\n _ansiballz_main()\r\n File \"/root/.ansible/tmp/ansible-tmp-1571753120.2970297-255844982294535/AnsiballZ_blivet.py\", line 106, in _ansiballz_main\r\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n File \"/root/.ansible/tmp/ansible-tmp-1571753120.2970297-255844982294535/AnsiballZ_blivet.py\", line 49, in invoke_module\r\n imp.load_module('__main__', mod, module, MOD_DESC)\r\n File \"/tmp/ansible_blivet_payload_ADJzqt/__main__.py\", line 784, in <module>\r\n File \"/tmp/ansible_blivet_payload_ADJzqt/__main__.py\", line 781, in main\r\n File \"/tmp/ansible_blivet_payload_ADJzqt/__main__.py\", line 747, in run_module\r\n File \"/tmp/ansible_blivet_payload_ADJzqt/__main__.py\", line 568, in manage_volume\r\n File \"/tmp/ansible_blivet_payload_ADJzqt/__main__.py\", line 230, in manage\r\n File \"/tmp/ansible_blivet_payload_ADJzqt/__main__.py\", line 149, in _look_up_device\r\n File \"/tmp/ansible_blivet_payload_ADJzqt/__main__.py\", line 283, in _get_device_id\r\nAttributeError: 'NoneType' object has no attribute '_disks'\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1

Member:

Ah, maybe I have to change volumes to pools; partition is a pool type, not a volume type.

Collaborator Author:

It is both a pool type and a volume type, much like lvm.

Member:

@dwlehman but

          vars:
            storage_volumes:
              - name: test1
                type: partition

did not work (see the error above), so I had to change it to the version that is there now:

      vars:
        storage_pools:
          - name: foo
            type: partition
            disks: "{{ unused_disks }}"
            state: absent

Collaborator Author:

Sorry, I didn't look closely enough at the context. There is indeed a 'partition' volume type but, just like the 'lvm' volume type, such volumes must be defined within a pool of the same type.
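
For comparison, a sketch of the analogous lvm case, where the volume is nested inside a pool of the same type (the size here is illustrative):

    storage_pools:
      - name: foo
        type: lvm
        disks: "{{ unused_disks }}"
        volumes:
          - name: test1
            size: "4g"
            fs_type: xfs
            mount_point: /opt/test1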

name: storage
vars:
storage_volumes:
- name: test1
Member:

@dwlehman Should be foo, no? Not test1

- name: test1
type: disk
disks: "{{ unused_disks }}"
present: false
Member:

Is there a present option? I would suppose that this would be handled via state (present or absent).
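
For reference, a sketch of the state-based form this comment has in mind, mirroring the snippet above (state values as described in the PR summary):

        - name: test1
          type: disk
          disks: "{{ unused_disks }}"
          state: absent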

Member:

This commit da350a9 in PR #54 hopefully fixes the cleanup issues.

- __storage_mounts_before_packages.ansible_facts.ansible_mounts |
count !=
__storage_mounts_after_packages.ansible_facts.ansible_mounts | count

Member:

@dwlehman shouldn't we make it a separate test rather than asserting every time the role is used? This commit was intended to quickly illustrate the problem, but maybe it is not ideal to keep it in the long term.

Collaborator Author:

Weird, I tried to leave that one out for that reason. Do you know which commit it came with? Maybe one of the merge commits?

Member:

It should be this one: commit 57dd5cc, "Merge remote-tracking branch 'pcahyna/mount-assert' into non-destructive".

tyll (Member) commented Oct 31, 2019:

IMHO there are too many changes/commits in this PR to allow a meaningful review. Non-controversial commits in particular, like changing classes to inherit from object, would be better suited to an individual PR that could already be merged.

tyll mentioned this pull request on Oct 31, 2019
pcahyna (Member) commented Oct 31, 2019:

Well, this class change is on top of changes in this PR; it is not independent. If we want to simplify this PR, the change should rather be squashed with the one that introduces the problem. I spent about a week testing the changes in this PR, so I believe it is OK.

dwlehman (Collaborator Author) commented Nov 5, 2019:

@tyll is this better now that it's been squashed to seven commits?

tyll (Member) left a comment:

Here are some observations

that: "{{ blivet_output.failed and
blivet_output.msg|regex_search('cannot remove existing formatting and/or devices on disk.*in safe mode') and
not blivet_output.changed }}"
msg: "Unexpected behavior w/ existing data on specified disks"
Member:

IMHO abbreviations like w/ should be avoided to keep this easier to read.

Collaborator Author:

These are only the output of integration tests, but okay.

- name: Verify the output
assert:
that: "{{ not blivet_output.failed and blivet_output.changed }}"
msg: "Unexpected behavior w/ existing data on specified disks"
Member:

IMHO abbreviations like w/ should be avoided to keep this easier to read.

that: "{{ blivet_output.failed and
blivet_output.msg|regex_search('cannot remove existing formatting on volume.*in safe mode') and
not blivet_output.changed }}"
msg: "Unexpected behavior w/ existing data on specified disks"
Member:

IMHO abbreviations like w/ should be avoided to keep this easier to read.

@@ -31,6 +31,9 @@
disklabel_type:
description:
- disklabel type string (eg: 'gpt') to use, overriding the built-in logic in blivet
safe_mode:
description:
- boolean indicating that we should fail rather than implicitly/automatically removing devices or formatting
Member:

FYI: The lines are so long that GitHub requires horizontal scrolling to be able to read the full line. Seems like there is space for about 118 characters. Ideally the lines would be at most 88 characters IMHO.

Collaborator Author:

This requirement is arbitrary at best and silly at worst. How did you arrive at 88 chars as a max? It was once 65 or 68, then it was 70-something, then 80, now 88? Most displays are 1200px or wider, so I don't understand the need to limit line lengths like we did when there was a fixed console width, but I also don't want to spend too much time arguing about it.

tyll (Member) commented Nov 6, 2019:

The 88 character suggestion comes from the black formatter (https://github.com/psf/black#line-length); to me the arguments make sense. I do not want to argue about it, I just use black.

Regarding the display size: I have a 3840 px display; however, as I wrote, GitHub does not use all the space to show your long line but requires horizontal scrolling. This makes it painful to read these lines. It is also best practice in the analog world to wrap lines at a certain point to make them easier to read (see newspapers and journals), because of the limited width of human vision.

Now that you have this information, it is up to you whether looking at your patches is an unpleasant or a pleasant experience. I do not intend to force you to do anything.

Collaborator Author:

For me, the point of being unpleasantly (or oddly) long is around 120 characters.

I do not have any interest in changing my code (or code comments) to make the GitHub web UI experience better, but I still take your point about there being a limit somewhere.

def _look_up_device(self):
super(BlivetDiskVolume, self)._look_up_device()
if not self._get_device_id():
# FAIL: no disks specified for volume
Member:

This comment does not add extra information and could be removed. The next line says the same.

if not disk.isleaf:
self._blivet.devicetree.recursive_remove(disk)
if not disk.isleaf or disk.format.type is not None:
if not safe_mode:
Member:

The negation can be avoided by switching the statements to simplify the code:

if safe_mode:
    raise ...
else:
    self._blivet...

@@ -490,7 +513,10 @@ def _look_up_device(self):
def _create(self):
if self._device.format.type != "disklabel" or \
self._device.format.label_type != disklabel_type:
self._blivet.devicetree.recursive_remove(self._device, remove_device=False)
if not safe_mode:
Member:

The negation can be avoided here as well.

@@ -660,6 +686,7 @@ def run_module():
volumes=dict(type='list'),
packages_only=dict(type='bool', required=False, default=False),
disklabel_type=dict(type='str', required=False, default=None),
safe_mode=dict(type='bool', required=False, default=False),
Member:

Shouldn't the default be True here for safety? The documentation implies that True is the default and it is implemented via the role variables only.

Collaborator Author:

Yes, it should. Good catch.

- ansible_failed_task.name != 'UNREACH'
msg: "Role has not failed when it should have"
vars:
# Ugh! needed to expand ansible_failed_task
Member:

This comment seems to indicate that there is still something that needs to be done but to me it is unclear what is meant.

Collaborator Author:

@pcahyna can you answer this one?

Member:

It is a workaround for silly behavior in Ansible. The problem is that ansible_failed_task.name references the storage_provider variable, which is not defined anymore at this point. But variables are evaluated lazily, so evaluating ansible_failed_task.name here crashes Ansible with an undefined variable reference. It should be done better, but I don't know how. (True, the comment should describe the problem better so it is not lost.)
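
A hedged sketch of how such a workaround could look, based on the snippet above: the otherwise-undefined variable is supplied in task-level vars so that ansible_failed_task.name can be expanded (the value blivet is taken from the role defaults shown earlier; the exact task wording is not from this PR):

    - name: Check that we failed in the role
      assert:
        that:
          - ansible_failed_task.name != 'UNREACH'
        msg: "Role has not failed when it should have"
      vars:
        # Ugh! needed to expand ansible_failed_task
        storage_provider: blivet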

Member:

Here are references to Ansible bug reports: ansible/ansible#49942, ansible/ansible#57399.

Member:

@tyll For the record, #88 avoids the problem. linux-system-roles/network#200 shows a different approach.

@@ -120,6 +120,7 @@
use_partitions = None # create partitions on pool backing device disks?
disklabel_type = None # user-specified disklabel type
safe_mode = None # do not remove any existing devices or formatting
packages_only = None # only set things up enough to get a list of required packages
Member:

This commit does not seem to fit in this PR and it seems that the added functionality is not used or tested. Not sure what it is doing here.

Collaborator Author:

Sure, I can move this out.

tyll (Member) commented Nov 6, 2019:

> @tyll is this better now that it's been squashed to seven commits?

AFAICS, it seems that the first commit is the primary reason for this PR. IMHO the other commits that do not really relate to the change and do not need any more discussion (like using the new-style classes) should be moved to their own PR and could already be merged.

Squashing the test fixes does not help to properly review them because now the commit messages do not relate to the changes.

Too much change at once, especially when it also requires discussion, makes this too confusing and makes it too hard to keep track of everything.

pcahyna (Member) commented Nov 6, 2019:

> @tyll is this better now that it's been squashed to seven commits?
>
> AFAICS, it seems that the first commit is the primary reason for this PR. IMHO the other commits that do not really relate to the change and do not need any more discussion (like using the new-style classes) should be moved to their own PR and could already be merged.

You are probably missing that without the new-style class fix, the code in the first commit simply does not work (399aee2#diff-bdb43621871ab82b00fbc3e2dcc1c681R265). I could remove the other new-style class changes and keep just the one required to make the PR work, though.

Without my other changes, the tests do not test anything, and IMO it is wrong to merge a pull request with falsely passing tests.

dwlehman force-pushed the non-destructive branch 4 times, most recently from f8f0774 to f1d2b7e, on November 14, 2019 19:53
dwlehman and others added 7 commits November 14, 2019 16:15
-
Use correct array syntax for an invalid disk

While both before and after it should trigger an error in the role,
and therefore both are valid tests, a valid array with a nonexistent disk
is more likely to be what was intended, and therefore preferable.
(The former variant actually checked for bad syntax, not for handling
of nonexistent disks.)

-
Test that the role fails, not that it sets blivet_output.failed

The latter would be a valid test as well, but a less important one, and
currently it does not work properly.

Do not use ignore_errors: yes on the test block; it makes all asserts pointless. Use a rescue task instead.

-
Create a file in the test fs to actually detect data loss

-
Do not use ignore_errors: yes in a test block

This makes all test asserts pointless.

Instead catch the error using rescue.

Actually verify that data have not been destroyed.

-
Remove last instance of ignore_errors: yes

and check for errors properly. Verify again that data have not been lost.

Do not use a pool type of partition; it seems to be unimplemented.

Create one volume in the pool; the role does not cope with empty pools.

Note that even with those changes the role application fails -
apparently it refuses to create a VG when there is already a filesystem,
even with safe mode off.

-
Turn off safe mode in cleanup

-
Adjust names of tests to match reality.
-
Check for errors properly in LVM error test, part 1

Remove ignore_errors in non-safe-mode tests and instead catch errors. Comment
out the currently broken tests.

-
Check safe_mode properly in the LVM tests

Do not use ignore_errors, catch the error instead.

-
Disable the resizing test, resizing does not seem to work
at all currently.

-
Remove the explicit setting of storage_safe_mode to true

True should be the default.

-
Safe mode should now be the default, account for this

-
Check that safe mode works even when the filesystem is unmounted.

-
Verify that replacing a fs by a LVM pool fails in safe mode

-
Cleanup properly.
dwlehman merged commit 822a07d into linux-system-roles:master on Nov 15, 2019
dwlehman deleted the non-destructive branch on November 15, 2019 15:40